Characterization of Caulimovirid-like Sequences from Upland Cotton (Gossypium hirsutum L.) Exhibiting Terminal Abortion in Georgia, USA

In this study, we investigated the potential involvement of endogenous viral elements (EVEs) in the development of apical tissue necrosis resulting in terminal abortion of upland cotton (Gossypium hirsutum L.) in Georgia. High-throughput sequencing (HTS) of symptomatic and asymptomatic plant tissue samples revealed near-complete EVE-Georgia (EVE-GA) sequences closely related to caulimoviruses. Comparison of the putative open reading frames (ORFs) of EVE-GA with those of cotton virus A (CotV-A) and endogenous cotton pararetroviral elements (eCPRVE) revealed similarity in putative ORFs 1-4. However, in ORF 5 and ORF 6, which encode the putative coat protein and reverse transcriptase, respectively, the EVE-GA sequences contain stop codons similar to those of eCPRVE sequences from Mississippi. In silico mining of cotton genome databases using EVE-GA as a query uncovered near-complete viral sequence insertions in the genomes of G. hirsutum (~7 kb) but only partial insertions in G. tomentosum (~5.3 kb) and G. mustelinum (~5.1 kb). Furthermore, episomal forms and messenger RNA (mRNA) transcripts of cotton EVEs were detected in both symptomatic and asymptomatic plants collected from cotton fields. No significant yield difference was observed between symptomatic and asymptomatic plants of the two varieties evaluated in the experimental plot. EVEs were also detected in cotton seeds and seedlings. This study emphasizes the need for future research on EVE sequences, their coding capacity, and any potential role in host immunity or pathogenicity.

Initially, early thrips infestation of emerging cotton led to the hypothesis that thrips-transmitted TSV was a potential cause of necrosis and terminal abortion. Another suspected viral pathogen was cotton leafroll dwarf virus (CLRDV), which is prevalent in Georgia and other cotton-growing regions of the USA [47]. Recent discoveries of endogenous cotton pararetroviral element (eCPRVE) sequences [25] and a novel cotton virus A (CotV-A) [56] in cotton from Mississippi prompted us to investigate the presence of such viruses and viral elements in field samples from GA. With advances in sequencing technologies, high-throughput sequencing (HTS) of small RNA (sRNA) [57,58] and long non-coding RNA (lncRNA) [25] is widely used to detect known and novel viruses without prior knowledge of the pathogen [59]. This study therefore aims (i) to characterize EVEs within the cotton genome using HTS and to assess Gossypium genomes in silico for endogenous caulimovirid-like sequences using the cotton genomic databases Phytozome (https://phytozome-next.jgi.doe.gov) [Accessed on 25 May 2024] [60] and CottonGen (https://www.cottongen.org/) [Accessed on 25 May 2024] [61]; and (ii) to evaluate the presence of their episomal forms and of mRNA transcripts of the movement protein gene, in addition to screening the small RNA sequences [62,63] from field samples for unknown or cryptic viruses.

Sample Collection
In early June 2023, during the vegetative stage of the crop, cotton plants exhibiting terminal abortion (symptomatic) and plants devoid of such symptoms (asymptomatic) (n = 54) were collected from commercial cotton fields at two locations: the Sunbelt Agricultural Expo (n = 28) in Colquitt County and Hopeful (n = 26) in Mitchell County, GA. Tissue (petiole, leaf, and tissue near the vegetative branching) from each plant was combined and processed at the Virology Lab, UGA Tifton, GA, to diagnose the presence of potential viruses affecting cotton. Samples from Colquitt County were pooled into three subsamples: symptomatic S1 (n = 12) and S2 (n = 12), and asymptomatic S3 (n = 4). Similarly, samples from Hopeful were pooled into three subsamples: symptomatic S4 (n = 11) and S5 (n = 11), and asymptomatic S6 (n = 4), resulting in six composite samples.
An experimental plot was established at the UGA Bowen research farm in Tifton, GA, to evaluate the incidence of terminal abortion and its potential impact on yield. Two cotton varieties, 'Dyna-Gro 3615 B3XF' and 'Dyna-Gro H959 B3XF' (DG3615 and DGH959, Loveland Products, Inc., Loveland, CO, USA), were planted in individual plots 16 rows wide and 750 feet long. For each variety, the four middle rows were selected to evaluate the association of EVEs with the terminal abortion observed in commercial fields, excluding the four border rows to avoid border effects. During the vegetative growth phase of both varieties, a symptomatic plant was selected and marked in each row along with an adjacent asymptomatic plant. This was replicated five times in each row, giving twenty replications per variety. For DGH959 only, one additional adjacent asymptomatic plant was selected, and its apical bud was manually terminated to induce terminal abortion. Thus, twenty symptomatic and twenty asymptomatic plants were selected for each variety, plus twenty induced terminal abortion plants in DGH959 only. In total, 100 individual plants were labeled, tagged, and monitored for symptom development and progression. Yield data were collected by hand-harvesting individual plants.
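The plant counts in this design follow from simple multiplication. The Python sketch below tallies them; the constant and function names are hypothetical, and the values are taken from the description above (four evaluated rows per variety, five replications per row, two treatments in DG3615 and three in DGH959).

```python
# Hypothetical tally of the Bowen farm plot design (values from the text).
ROWS_PER_VARIETY = 4   # four middle rows evaluated per variety
REPS_PER_ROW = 5       # each treatment marked five times per row

TREATMENTS = {
    "DG3615": ("symptomatic", "asymptomatic"),
    "DGH959": ("symptomatic", "asymptomatic", "induced terminal abortion"),
}


def plants_per_treatment() -> int:
    """Replications of one treatment within one variety."""
    return ROWS_PER_VARIETY * REPS_PER_ROW


def plants_per_variety() -> dict:
    """Tagged plants per variety = number of treatments x replications."""
    return {variety: len(trts) * plants_per_treatment()
            for variety, trts in TREATMENTS.items()}


if __name__ == "__main__":
    counts = plants_per_variety()
    print(plants_per_treatment())   # 20
    print(counts)                   # {'DG3615': 40, 'DGH959': 60}
    print(sum(counts.values()))     # 100
```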

Statistical Analysis
For DG3615, symptomatic and asymptomatic plants were considered two treatments, and a paired Student's t-test was performed to determine the effect of treatment on seedcotton yield per plant and boll density per plant. DGH959 had three treatments (symptomatic, asymptomatic, and induced terminal abortion plants), and a one-way mixed-effects analysis of variance (ANOVA) was used to determine treatment effects on the same response variables. Treatment effects were considered significant at p < 0.05. Statistical analysis was performed using JMP Pro version 16 (SAS Institute; Cary, NC, USA).
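For readers who want to check such comparisons outside JMP, the Python sketch below is a minimal, standard-library-only version under stated assumptions: it computes a two-sided paired t-test and a fixed-effects one-way ANOVA, obtaining p-values from the regularized incomplete beta function (continued-fraction evaluation as described in Numerical Recipes). It has no random row or block term, so it is a simpler model than the mixed-effects ANOVA run in JMP Pro, and the yield values in the usage example are hypothetical.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def paired_t_test(x, y):
    """Two-sided paired t-test; returns (t, df, p)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [xi - yi for xi, yi in zip(x, y)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    df = len(diffs) - 1
    # Two-sided p-value: P(|T| >= |t|) = I_{df/(df+t^2)}(df/2, 1/2)
    return t, df, betai(df / 2.0, 0.5, df / (df + t * t))


def one_way_anova(groups):
    """Fixed-effects one-way ANOVA; returns (F, df_between, df_within, p)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [statistics.mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    df_b, df_w = len(groups) - 1, n_total - len(groups)
    f = (ss_between / df_b) / (ss_within / df_w)
    # Upper-tail F probability: P(F >= f) = I_{df_w/(df_w+df_b*f)}(df_w/2, df_b/2)
    return f, df_b, df_w, betai(df_w / 2.0, df_b / 2.0, df_w / (df_w + df_b * f))


if __name__ == "__main__":
    # Hypothetical seedcotton yields per plant for paired plants.
    symptomatic = [112.0, 98.5, 130.2, 105.1, 120.4]
    asymptomatic = [118.3, 101.0, 125.7, 110.9, 117.2]
    induced = [109.8, 121.5, 99.4, 115.0, 108.6]
    print(paired_t_test(symptomatic, asymptomatic))
    print(one_way_anova([symptomatic, asymptomatic, induced]))
```

The incomplete beta route lets both tests share one p-value routine without third-party packages; with SciPy available, `scipy.stats.ttest_rel` and `scipy.stats.f_oneway` cover the same fixed-effects cases.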

Seed and Seedling Assessment
Seeds and seedlings from four commercial cotton varieties, Deltapine 1646 B2XF (DP1646, Bayer Crop Science, St. Louis, MO, USA), Stoneville 4595 B3XF (ST4595, BASF Corporation, Research Triangle Park, NC, USA), DG3615, and DGH959, were evaluated in both laboratory and greenhouse settings. This evaluation aimed to determine whether EVE expression was induced by abiotic factors under field conditions or was also evident in controlled environments. Ten seeds from each variety were sown directly in soil (Pro-Mix Premier HP, peat-based growing medium) in 6-well trays (1.5″ square × 2.25″ deep) and maintained inside insect-free cages (BugDorm, 160 µm aperture, MegaView Science Co., Ltd., Taichung, Taiwan, China) in the greenhouse facility at the University of Georgia, Tifton, GA. The greenhouse was maintained at 28 ± 3 °C and 50 ± 20% relative humidity throughout the experiment. Simultaneously, a second set of ten seeds from each variety was placed on water-soaked filter paper in a Petri dish and incubated at room temperature in the dark to induce germination. All varieties except DP1646 were supplied pre-treated with fungicide. After 96 h, sprouted seeds and seedlings were divided into parts for EVE testing. The seed coat (testa) was separated, and each sprouted seed was divided into the upper shoot with plumule and epicotyl (P + E) and the lower root portion, including the hypocotyl and root (H + R) (Figure 1A). Seedlings grown under greenhouse conditions were collected, and each seedling was separated into three parts for EVE analysis (cotyledon leaves, one inch of stem including the meristem, and roots) (Figure 1B).
Total RNA was extracted from the six composite commercial field samples from Colquitt and Mitchell Counties using the Spectrum™ Plant Total RNA Kit (Sigma-Aldrich, St. Louis, MO, USA) following the manufacturer's protocol. Total DNA was extracted from the same set of samples using the DNeasy Plant Mini Kit (Qiagen, Germantown, MD, USA). Extracted RNA and DNA were used for PCR assays, and total RNA was used for library preparation and high-throughput sequencing.
Additionally, total nucleic acid (TNA) was extracted using magnetic bead technology, following the protocol described by Adeleke et al. [63], from individual field samples collected from Colquitt (n = 28) and Mitchell (n = 26) Counties, leaf petiole samples (n = 100) collected from the UGA Bowen farm experimental plot, and seeds (n = 50) and seedlings (n = 50) from the greenhouse and laboratory.
The quality and quantity of the nucleic acids were determined using a NanoDrop One UV-Vis Spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). DNA, RNA, and TNA samples with an A260/A280 ratio of ≥1.8 were aliquoted for further analysis and stored at −80 °C.
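The NanoDrop instrument reports the A260/A280 ratio directly; the Python sketch below only illustrates how exported readings could be screened against the ≥1.8 purity criterion stated above. The sample names, readings, and function names are hypothetical.

```python
# Purity threshold stated in the text; sample names and readings are hypothetical.
MIN_A260_A280 = 1.8


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 absorbance ratio."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def samples_passing(readings: dict, min_ratio: float = MIN_A260_A280) -> list:
    """Sorted IDs of samples whose A260/A280 ratio is at least min_ratio."""
    return sorted(sample for sample, (a260, a280) in readings.items()
                  if purity_ratio(a260, a280) >= min_ratio)


if __name__ == "__main__":
    readings = {"sample_A": (1.20, 0.60), "sample_B": (0.90, 0.55),
                "sample_C": (1.05, 0.56)}
    print(samples_passing(readings))  # ['sample_A', 'sample_C']
```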

Nucleic Acid Treatment
To detect mRNA transcripts of EVEs and to avoid false-positive detection of cotton EVE sequences integrated in the plant genomic DNA, four composite RNA samples from commercial cotton fields (S2: symptomatic; S3: asymptomatic; S5: symptomatic; and S6: asymptomatic) and the TNA samples were treated with DNase I (Thermo Fisher Scientific, Waltham, MA, USA) before cDNA synthesis. DNase treatment was performed on <1 µg of RNA/TNA at 37 °C for 30 min. The reaction was stopped with 0.5 M EDTA (pH 8.0), followed by incubation at 65 °C for an additional 10 min.
Before the PCR assay, aliquots of the four composite DNA samples from commercial cotton fields were treated with exonuclease V. This enzyme degrades linear double-stranded DNA from both ends (5′→3′ and 3′→5′), enabling detection of circular caulimovirus episomal DNA [30]. Similarly, TNA from seeds of two varieties, DG3615 (n = 10) and ST4595 (n = 10), was treated with exonuclease V to detect episomal forms of DNA. This treatment was crucial to distinguish the target episomal DNA from EVE sequences integrated in the G. hirsutum genome. For this treatment, less than 1 µg of DNA was mixed with Exonuclease V (RecBCD) (NEB, Ipswich, MA, USA), together with the ATP and buffer supplied by the manufacturer, and incubated at 37 °C for 30 min to digest linear DNA. Then, 0.5 M EDTA (pH 8.0) was added, and the mixture was incubated at 37 °C for an additional 30 min to stop the enzymatic reaction. Digestion of linear DNA in the commercial cotton field samples (S2, S3, S5, S6) was confirmed by 0.8% horizontal agarose gel electrophoresis alongside untreated DNA and visualized using a gel documentation system (Analytik Jena UVP UVsolo Touch, Upland, CA, USA).

Virus Detection
cDNA was synthesized with SuperScript III (Invitrogen) using reverse primers for the CLRDV capsid protein and P0 genes and for the TSV capsid and movement protein genes, followed by PCR using gene-specific primer pairs (Table 1). To detect episomal DNA, an end-point PCR assay targeting the movement protein gene was performed on Exonuclease V-treated DNA samples using the caulimovirus movement protein primer pair (Caulimo MP-F and Caulimo MP-R) (Table 1). A DNase-RT-PCR assay was performed on cDNA of the same samples to detect EVE mRNA transcripts. All samples (n = 100) collected from the experimental plot were screened for CLRDV by RT-qPCR using primers targeting part of the capsid protein gene, and for the caulimovirus movement protein gene by DNase-RT-PCR using the caulimovirus movement protein primer pair (Table 1).
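The logic of combining the two movement protein assays can be written as a small decision rule. The Python sketch below assumes complete Exonuclease V digestion of linear (integrated) DNA and complete DNase removal of DNA templates, and it treats a negative result only as "not detected", that is, below the assay's detection limit. The class and function names are hypothetical; the four outcomes encoded are those reported in the Results for S2, S3, S5, and S6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssayResult:
    exo_v_pcr: bool     # MP amplicon from Exonuclease V-treated DNA
    dnase_rt_pcr: bool  # MP amplicon from DNase-treated RNA after RT


def interpret(result: AssayResult) -> str:
    """Exo V removes linear DNA, so a post-digestion amplicon points to circular
    (episomal) templates; DNase removes DNA, so an RT-PCR amplicon points to RNA."""
    episomal = ("episomal DNA detected" if result.exo_v_pcr
                else "no episomal DNA detected")
    transcripts = ("transcripts detected" if result.dnase_rt_pcr
                   else "no transcripts detected")
    return f"{episomal}; {transcripts}"


# Outcomes for the four composite samples as reported in the Results.
reported = {
    "S2": AssayResult(exo_v_pcr=False, dnase_rt_pcr=True),
    "S3": AssayResult(exo_v_pcr=False, dnase_rt_pcr=False),
    "S5": AssayResult(exo_v_pcr=True, dnase_rt_pcr=True),
    "S6": AssayResult(exo_v_pcr=True, dnase_rt_pcr=True),
}

if __name__ == "__main__":
    for sample, result in reported.items():
        print(sample, "->", interpret(result))
```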
The sRNA sequence analysis was carried out using CLC Genomics Workbench (v23.0.4) (Qiagen, Redwood City, CA, USA). Sequence reads were de novo assembled into contigs using default parameters, and the resulting contigs were aligned against the suspected viral sequences using NCBI BLAST. A local virus nucleotide database and a phytoplasma nucleotide database were downloaded from the National Center for Biotechnology Information (NCBI) on 5 July 2023 using the Create Database feature of CLC Genomics Workbench 23. The contigs were further compared for similarity against all sequences in these databases using the BLASTn tool [65] with the default parameters of CLC Genomics Workbench 23. Sequence reads were also mapped to the individual reference sequences of the suspected viruses: CLRDV (NC_014545.1), TSV (KP256522.1), eCPRVE (OR269951), and CotV-A (OR184923). Similarly, lncRNA reads were trimmed to remove adapters and low-quality reads and mapped against the CLRDV (NC_014545.1), TSV (KP256522.1), CotV-A (OR184923), and eCPRVE (OR269951) sequences.
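The contig comparison above was run with BLASTn inside CLC Genomics Workbench. If the same comparison were instead run with command-line BLAST+ using its default tabular output (-outfmt 6), the best-scoring match per contig could be extracted as in the Python sketch below. The switch to BLAST+, the identity cutoff, and the contig and reference names are assumptions for illustration, not the authors' pipeline.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tsv_text: str, min_pident: float = 0.0) -> dict:
    """Return the highest-bitscore subject per query contig, skipping hits
    below min_pident percent identity."""
    best = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        rec = dict(zip(COLUMNS, row))
        pident, bitscore = float(rec["pident"]), float(rec["bitscore"])
        if pident < min_pident:
            continue
        query = rec["qseqid"]
        if query not in best or bitscore > best[query]["bitscore"]:
            best[query] = {"sseqid": rec["sseqid"], "pident": pident,
                           "bitscore": bitscore}
    return best


# Hypothetical hits; subject names are placeholders, not real accessions.
EXAMPLE = (
    "contig_1\tref_A\t98.1\t1200\t23\t0\t1\t1200\t50\t1249\t0.0\t2100\n"
    "contig_1\tref_B\t97.5\t1180\t30\t1\t5\t1184\t10\t1189\t0.0\t2010\n"
    "contig_2\tref_C\t84.0\t300\t48\t2\t1\t300\t700\t999\t1e-40\t160\n"
)

if __name__ == "__main__":
    print(best_hits(EXAMPLE, min_pident=90.0))
```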
Near-complete caulimovirid-like consensus sequences obtained from field samples were compared with the sequences of CotV-A, eCPRVE, and other members of the family Caulimoviridae available in NCBI [Accessed in April 2024]. Multiple sequence alignment and maximum likelihood phylogenetic analysis were performed using MEGA 11 [66]. Of the four EVE sequences obtained, PP943202 was used as the query to mine Phytozome (https://phytozome-next.jgi.doe.gov/) [Accessed on 25 May 2024], the plant comparative genomics portal of the Department of Energy's Joint Genome Institute, which hosts updated sequenced genomes of cotton species. The query sequence was compared using BLAST (BLASTN 2.11.0+) against the genomes of the following Gossypium species: G. raimondii; G. hirsutum; G. mustelinum_v1_1; G. tomentosum_v1_1; G. barbadense_v1.1; G. hirsutum_v2.1; G. darwinii_v1.1; G. hirsutum UGA230; G. hirsutum UA48; G. hirsutum CSX8308; G. hirsutum; G. hirsutum DeltaPearl; G. hirsutum FM958; and G. hirsutum Coker. In addition, the query sequence was compared with the G. stephensii (AD7) 'AD701' genome sequence available in the CottonGen database (https://www.cottongen.org) [Accessed on 25 May 2024], which had not been reported earlier.
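Genome BLAST searches of this kind typically return several high-scoring segment pairs (HSPs) per chromosome that differ in length and identity, and condensing them per chromosome simplifies interpretation. The Python sketch below does this for a list of HSP records; the length and identity cutoffs, the record layout, and the example values are assumptions for illustration and are not taken from the Phytozome output.

```python
from collections import defaultdict
from typing import NamedTuple


class HSP(NamedTuple):
    chromosome: str
    orientation: str   # query/subject strand, e.g. "+/-"
    length: int        # aligned length in bp
    identity: float    # percent identity


def summarize_hsps(hsps, min_length=1000, min_identity=70.0):
    """Per chromosome: (number of qualifying HSPs, longest HSP, highest identity)."""
    summary = defaultdict(lambda: [0, 0, 0.0])
    for hsp in hsps:
        if hsp.length < min_length or hsp.identity < min_identity:
            continue  # ignore short or low-identity matches
        entry = summary[hsp.chromosome]
        entry[0] += 1
        entry[1] = max(entry[1], hsp.length)
        entry[2] = max(entry[2], hsp.identity)
    return {chrom: tuple(vals) for chrom, vals in summary.items()}


if __name__ == "__main__":
    # Illustrative HSPs only; not the actual Phytozome BLAST output.
    demo = [HSP("A04", "+/-", 7000, 99.0), HSP("A04", "+/-", 394, 98.0),
            HSP("D03", "+/+", 4000, 84.0), HSP("Chr_X", "+/+", 900, 60.0)]
    print(summarize_hsps(demo))  # {'A04': (1, 7000, 99.0), 'D03': (1, 4000, 84.0)}
```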

Symptomatology
During the 2023 growing season, cotton seedlings (2-4 leaf stage) exhibited terminal abortion (Figure 2A,B), leading to profuse vegetative branching (Figure 2C-E). Samples were collected from two commercial fields and the experimental plot, both from stunted plants with deformed leaf laminae, elongated petioles, and profuse vegetative branching due to terminal abortion (Figure 2D,E), and from asymptomatic plants devoid of such symptoms (Figure 2F).

Virus Detection
PCR analysis of commercial field samples (S2: symptomatic; S3: asymptomatic; S5: symptomatic; and S6: asymptomatic) using primer pairs specific for CLRDV and TSV did not amplify any target genes (Figures S1 and S2). The total DNA extracted from the samples (Figure 3A) was treated with exonuclease V (Figure 3B) and analyzed by horizontal gel electrophoresis. PCR amplification of the treated DNA targeting the caulimovirus movement protein gene yielded an amplicon of approximately 470 bp from the Mitchell County samples (S5: symptomatic; S6: asymptomatic) (Figure 3C) but not from the Colquitt County samples (S2: symptomatic; S3: asymptomatic). However, DNase-treated RNA RT-PCR targeting the caulimovirus movement protein gene produced a similar amplicon from both locations in all samples except S3 (Figure 3C,D), suggesting non-detectable titers of episomal forms together with low levels of RNA transcripts in S2, and neither in S3. CLRDV was detected in only two of the 100 samples tested from the experimental plot (one asymptomatic and one symptomatic for terminal abortion). In the same plot, EVEs were detected in a total of n = 96 samples, including 93% (37/40) of the symptomatic, 98% (39/40) of the asymptomatic, and 95% (19/20) of the induced terminal abortion samples. Amplicons were gel-purified, and their sequences were confirmed by Sanger sequencing, matching the eCPRVE (OR269951) and CotV-A (OR184923) partial movement protein gene sequences with 98-100% identity.
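The percentages above correspond to the stated counts rounded half up to whole numbers (37/40 = 92.5% is reported as 93%). Python's built-in round() rounds halves to the nearest even integer (round(92.5) returns 92), so the sketch below uses exact integer arithmetic instead; the function name is hypothetical, and the counts are those given in the text.

```python
def percent_half_up(k: int, n: int) -> int:
    """Whole-number percentage of k/n, rounding halves up, in exact integer arithmetic."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    return (200 * k + n) // (2 * n)


# EVE-positive counts per treatment in the experimental plot, as stated in the text.
DETECTIONS = {
    "symptomatic": (37, 40),
    "asymptomatic": (39, 40),
    "induced terminal abortion": (19, 20),
}

if __name__ == "__main__":
    for group, (k, n) in DETECTIONS.items():
        print(f"{group}: {k}/{n} = {percent_half_up(k, n)}%")
    print(round(100 * 37 / 40))  # 92 (round-half-to-even), not 93
```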

Validation of HTS Results
When the individual commercial field samples (n = 54) were tested, the caulimovirus movement protein gene was detected in both symptomatic and asymptomatic plants from both locations. It was detected in 13 of the 24 symptomatic samples from Colquitt County and 15 of the 22 symptomatic samples from Mitchell County, and in all asymptomatic plants from both locations (Table 3).

BLAST, Phylogenetic Analysis, and In Silico Mining
Consensus sequences from the lncRNA reads of symptomatic samples collected from growers' fields showed no matches with CLRDV (NC_014545.1) or with the ilarvirus TSV (KP256522.1) when analyzed with CLC Genomics Workbench. The sequences matched CotV-A and eCPRVE sequences in NCBI GenBank with 98% identity. In the phylogenetic analysis, the near-complete nucleotide sequences of the commercial field isolates from GA were 90-98% identical to eCPRVE sequences (OR269936 to OR269951) and to the DNA virus CotV-A (OR184923), both previously reported from Mississippi, USA. The EVE sequences from GA were 88% identical to those of caulimovirus members such as plant-associated caulimovirus (OL472131) and grapevine pararetrovirus (OP886324). These sequences form a clade distinct from other members of the family Caulimoviridae (Figure 5). In data mining using the EVE query sequence (PP943202) obtained from HTS, we found near-identical integrated sequences in tetraploid G. hirsutum cultivars, with three high-scoring segment pairs (HSPs) on chromosome A04 (+/−) of ~7 kb, ~6 kb, and 394 bp showing 97-100% identity (Supplementary Table S1). We also observed integrated EVE-like sequences on other G. hirsutum chromosomes with various alignment lengths and identities: D03 (+/+), ~4 kb with 84% identity; A05 (+/+), ~3 kb with 73% identity; D07 (+/−), 1059 bp with 80% identity; and A13 (+/−), 1254 bp with 76% identity.
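Percent identity values such as those above depend on how alignment gaps are handled. The Python sketch below shows one common convention: identical residues are counted only over aligned columns where neither sequence has a gap, which equals one minus the p-distance computed with pairwise deletion. The text does not state which convention was used, ambiguous bases are simply counted as mismatches here, and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # pairwise deletion: skip gapped columns
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments: 6 of 7 ungapped columns match.
    print(percent_identity("ATGC-TAA", "ATGCGTTA"))
```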

Cotton Yield Components
In DG3615, the yield components seedcotton yield and boll density did not differ significantly between symptomatic and asymptomatic plants (Table 4). Similarly, in DGH959, seedcotton yield and boll density did not differ significantly among treatments (symptomatic, asymptomatic, and induced terminal abortion) (Table 4). These results suggest that the terminal abortion observed in the 2023 growing season did not reduce yield in the varieties tested.

Discussion
In this study, we evaluated terminal abortion-symptomatic plants that appeared sporadically during the 2023 growing season in Tift, Mitchell, and Colquitt Counties, GA. Concurrently, the identification of EVEs in cotton [25,56] raised the question of whether these elements could play a role in terminal abortion leading to profuse vegetative branching. To better understand the presence of EVEs in Georgia-grown cotton, we investigated and detected near-complete sequences (~7.4 kb) in both symptomatic (terminal abortion) and asymptomatic samples. Terminal abortion of cotton typically arises from abiotic factors such as wind and hail damage. Occasionally, it can also be triggered by biotic factors, including feeding by insects such as tarnished plant bugs, which are commonly found on weed hosts like Palmer amaranth (Amaranthus palmeri S. Watson) [67]. Sucking pests like thrips (Frankliniella fusca Hinds) can also induce terminal abortion by feeding on slow-growing cotton seedlings at cold temperatures [68,69]. However, our field observations and thrips infestation predictors indicated that thrips populations were very low in the fields during the early weeks of June 2023 [68,70], potentially owing to recurrent rainfall and routine prophylactic management practices [71,72] implemented at the onset of each crop season.
Previous studies have shown that replication-competent EVEs can be induced by various factors such as genome hybridization, tissue culture, abiotic stress, and wounding. This has been observed in hosts such as banana (endogenous banana streak viruses, eBSVs) [35,36], tobacco (endogenous tobacco vein-clearing virus, eTVCV) [73], and petunia (endogenous petunia vein-clearing virus, ePVCV) [74][75][76]. In some cases, EVEs are incapable of autonomous replication owing to deficiencies in structural domains, but their replication can sometimes be supported by co-infecting viruses of the same family [77]. Thus, the samples were tested for the viruses and viral elements suspected to be potential causal agents of terminal abortion, including CLRDV, TSV, and EVEs (CotV-A and eCPRVE). The absence of CLRDV and TSV in RT-PCR assays of samples collected from commercial cotton fields suggests that they were not involved in terminal abortion. In the UGA Bowen farm experimental field, CLRDV was absent except in one symptomatic (terminal abortion) and one asymptomatic sample, indicating that CLRDV was not prevalent and is unlikely to be a causal agent of terminal abortion. Furthermore, in the Bowen farm experimental plot, the EVE detection rate exceeded 90% in every treatment (symptomatic, asymptomatic, and induced terminal abortion), regardless of symptom status. This supports the hypothesis that EVEs are unlikely to be the causal agent of terminal abortion.
To understand the functional status of the recently identified cotton EVEs, we analyzed their episomal forms and mRNA transcripts using the caulimovirus movement protein gene primers for CotV-A described by Ortiz et al. [56]. Our analysis revealed episomal forms of EVEs in Mitchell County samples but not in Colquitt County samples. mRNA transcripts of the caulimovirus movement protein gene were detected in both symptomatic and asymptomatic samples from Mitchell County, but only in the symptomatic sample from Colquitt County; they were not detected in the asymptomatic sample from Colquitt County. These results indicate that the formation of episomal DNA from host-integrated sequences may be inconsistent or redundant and cannot be directly linked to terminal abortion. Squires et al. [78] found that episomal DNA expression of a cauliflower mosaic virus (CaMV) infectious clone in Arabidopsis was not temperature-dependent. Similarly, we detected episomal DNA of EVEs both in symptomatic and asymptomatic field samples exposed to environmental stress (temperature differences) and in seeds independent of environmental stress.
HTS analysis of sRNA and lncRNA sequences did not detect CLRDV or TSV. However, near-complete caulimovirid-like sequences related to eCPRVE and CotV-A were detected in lncRNAs but not in sRNAs. lncRNAs (>200 nt) are moderately abundant fractions of eukaryotic transcriptomes that lack coding capacity but are involved in plant gene regulation, and some act as positive or negative regulators of plant immunity [79,80]. In comparison, sRNAs (about 18-40 nt), such as microRNAs and small interfering RNAs, are non-coding and participate in antiviral immunity by guiding Argonaute proteins to cleave target viral RNA [58]. HTS assays are widely used for comprehensive assessment of pathogen profiles and play a crucial role in discovering emerging, re-emerging, and mixed viral infections in both cultivated crops and wild plant species [81,82]. Many plants respond to exogenous virus infections via transcriptional gene silencing (TGS) and post-transcriptional gene silencing (PTGS), with PTGS occurring in the cytoplasm and targeting dsRNA intermediates. Although DNA viruses replicate in the nucleus, their mRNAs are transported to the cytoplasm for translation or, in pararetroviruses, for reverse transcription, which makes them potential PTGS targets [83]. Nevertheless, the absence of sRNAs derived from EVE sequences related to eCPRVEs, CotV-A, or any persistent viruses [62,63] in the HTS sRNA data argues against their involvement in terminal abortion. The absence of viral sRNAs in both symptomatic and asymptomatic samples, even with sensitive HTS, suggests that no active host defense response against these sequences was occurring. This inactivity could be attributed to the loss of infectivity of EVE sequences through insertions and deletions (indels), mutations, and fragmentation during host genome replication [25].
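The size ranges quoted above suggest a first-pass way to separate reads before mapping. The Python sketch below bins read lengths into the sRNA (18-40 nt) and lncRNA (>200 nt) ranges given in the text; it is an illustration only, since classifying a transcript as a lncRNA also requires assessing its coding potential, and the read lengths in the example are hypothetical.

```python
from collections import Counter


def size_class(length_nt: int) -> str:
    """Bin a read by length using the ranges quoted in the text."""
    if 18 <= length_nt <= 40:
        return "sRNA range (18-40 nt)"
    if length_nt > 200:
        return "lncRNA range (>200 nt)"
    return "other"


def tally(read_lengths) -> Counter:
    """Count reads per size class."""
    return Counter(size_class(n) for n in read_lengths)


if __name__ == "__main__":
    lengths = [21, 22, 24, 35, 150, 250, 1200]  # hypothetical read lengths
    print(tally(lengths))
```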
The detection of near-full-length EVE sequences in the cotton samples (symptomatic and asymptomatic) collected from growers' fields in Colquitt and Mitchell Counties, GA, implies that these sequences are present in the genetic background of Gossypium hirsutum L. varieties [25]. Moreover, the sequences we obtained contain multiple ORFs similar to those reported by Aboughanem-Sabanadzovic et al. [25], which are capable of encoding various proteins. However, as in the eCPRVEs described by Aboughanem-Sabanadzovic et al. [25], the GA sequences had stop codons in ORFs 5 and 6, which encode the putative viral coat protein and reverse transcriptase, likely rendering them nonfunctional. In caulimoviruses, mRNA is usually polycistronic and translated via ribosomal reinitiation [84]. Interestingly, no such stop-codon interruptions were observed in the CotV-A sequence, which requires further evaluation to confirm the functional status of these ORFs [56].
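An ORF interrupted by stop codons, as described above for ORFs 5 and 6, can be flagged by scanning the expected reading frame codon by codon. The Python sketch below reports in-frame stop codons occurring before the final codon of an expected ORF; the ORF coordinates would have to come from an alignment with an uninterrupted reference such as CotV-A, and the example sequences and function name are hypothetical. Frameshifting indels, which also disrupt ORFs, are not detected by this simple scan.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(seq: str, start: int, end: int) -> list:
    """0-based positions of in-frame stop codons before the final codon of the
    expected ORF seq[start:end] (end exclusive; length a multiple of 3)."""
    if (end - start) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    if not 0 <= start < end <= len(seq):
        raise ValueError("ORF coordinates out of range")
    seq = seq.upper()
    return [i for i in range(start, end - 3, 3) if seq[i:i + 3] in STOP_CODONS]


if __name__ == "__main__":
    intact = "ATGAAACCCGGGTGA"       # hypothetical ORF ending in a stop codon
    interrupted = "ATGAAATAGGGGTGA"  # same length, internal TAG at position 6
    print(premature_stops(intact, 0, len(intact)))            # []
    print(premature_stops(interrupted, 0, len(interrupted)))  # [6]
```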
In silico mining of the cotton genome databases with the EVE sequences indicates the presence of endogenous viral sequences in various species of Gossypium L., not limited to the G. hirsutum cultivars prominent in North and South America but also including a cultivar from Australia (G. hirsutum CSX8308). Caulimovirus-like near-complete EVE sequences were also found on chromosomes A04 and D03 of G. stephensii, with matches similar to those in G. hirsutum genomes. This supports the hypothesis, proposed by Aboughanem-Sabanadzovic et al. [25], that the integration predates the speciation of G. hirsutum, estimated at 0.75 mya. However, EVE integrations of ~5 kb with 79% identity were found on chromosome A03 in G. tomentosum, and of ~5 kb with 70% identity on chromosome D07 in G. mustelinum. These integrations were also present in two other tetraploid species (G. barbadense and G. darwinii) as very short (~1 kb) sequences with 70% identity (Table S1). This high degree of sequence degradation raises the question of whether the integration occurred much earlier (1.80 mya) (Figure 6) than the G. hirsutum speciation event, in the tetraploid "AADD" ancestors, or whether multiple independent integrations occurred more recently (0.75 mya); this needs further investigation. Identifying active mRNA transcripts of ORF3 (movement protein) [25] in EVEs only partially evaluates the entire polycistronic mRNA. Deeper insight into the genomic annotations and functions of the other ORFs could significantly enhance our understanding of EVEs in cotton. Additional investigation is also essential to determine whether episomal DNAs are involved in virion formation and infectivity. Although eCPRVE sequences were discovered in CLRDV-infected cotton plants [25], no correlation (synergistic or antagonistic) between the two has been established yet. Clarifying such synergism or antagonism is vital to understanding their role in infection and symptom development within the host plant.
In a study of the Dahlia variabilis endogenous pararetrovirus sequence (DvEPRS), which is integrated into the host dahlia (D. variabilis) genome, DvEPRS was detected in various tissues, including leaves, roots, seeds, flower petals, and pollen, and showed 100% seed transmission [27,77,85]. Similarly, our results indicate the presence of EVEs in seeds and seedlings, as episomal forms and mRNA transcripts, although other tissues were not tested. However, transmission of DvEPRS by mechanical inoculation and by aphids (Myzus persicae) was unsuccessful [77]. Further research is needed to explore such possibilities for EVEs in cotton.
To address concerns about the emerging problem of terminal abortion resulting in profuse vegetative branching, yield impact was assessed in two varieties (DG3615 and DGH959) in an experimental plot at the UGA Bowen research farm in Tifton, GA. The results showed no significant difference between symptomatic and asymptomatic plants of either variety in the yield response variables: seedcotton yield (DG3615: p = 0.4139; DGH959: p = 0.8866) and boll density (DG3615: p = 0.4933; DGH959: p = 0.7028). These findings imply that the terminal abortion and resulting profuse vegetative branching observed in the 2023 growing season did not adversely affect yield in the varieties tested. This further supports the view that terminal abortion may not be a major concern at present and that affected plants are likely to recover and sustain yields. Although the exact causal agent of terminal abortion remains undetermined, it would be worthwhile to explore how cotton genotypes respond to climate change and rising temperatures. Nevertheless, our study provides valuable insights for cotton growers and researchers into the significance of caulimovirid-like EVEs in the cotton genome and paves the way for future research on their activity and involvement in host interactions. In some cases, putative EVEs may confer host resistance to associated viral infections [86–89]. The detection of EVE sequences in lncRNAs but not in small RNAs raises the intriguing question of whether they play a role in host immunity, which remains to be substantiated in future research.
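The p-values above come from the study's own analysis, whose statistical model is not described in this section. Purely to illustrate a symptomatic-versus-asymptomatic comparison, the Python sketch below uses a swapped-in technique, an exact two-sided permutation test on the difference in group means, which is not necessarily the test used in the study. The function name and the plot yields in the usage lines are hypothetical placeholders, not data from the Tifton experiment.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means.

    Every way of assigning len(group_a) of the pooled values to group A is
    enumerated, so run time grows combinatorially with sample size.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    as_extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Count relabelings at least as extreme as the observed split;
        # the small tolerance absorbs floating-point rounding in ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-9:
            as_extreme += 1
        total += 1
    return as_extreme / total


# Hypothetical seedcotton yields (kg/ha) for four plots per group.
symptomatic = [1510, 1630, 1475, 1590]
asymptomatic = [1550, 1600, 1520, 1645]
print(f"p = {permutation_p_value(symptomatic, asymptomatic):.4f}")
```

Exhaustive enumeration is practical only for a small number of plots; for larger or blocked field designs, a random-sampling permutation test or a mixed model that accounts for blocks would be the more usual choice.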

Conclusions
During the 2023 growing season in Georgia, USA, the intermittent appearance of terminal abortion in young cotton plants with no apparent cause raised concerns among industry and academic scientists alike. Samples were therefore evaluated from several angles to address these concerns. The impact of terminal abortion on cotton yield was assessed, and no significant difference was found between symptomatic and asymptomatic plants. The association of tarnished plant bugs and thrips with terminal abortion was also not significant at the time of symptom appearance. This study further evaluated the presence of EVEs in the cotton genome and found no correlation between their presence and the occurrence of terminal abortion. Further research on cotton EVEs is needed to understand their true functionality and any role in pathogenicity or immunity.

Figure 2. Terminal abortion symptoms in cotton plants: (A,B) terminal abortion at primary growth stages (2–4 true leaf stage) of cotton seedlings in the field; (C) initial stage of terminal abortion resulting in profuse vegetative branching; (D) symptomatic plant in the experimental plot in Tift County; (E) vegetative branching from Expo-Colquitt and Hopeful-Mitchell counties, GA; (F) asymptomatic plant without any vegetative branching. Photo credit: S.E., P.C. and S.B.

Figure 4. Read coverage maps of field-sample EVE-GA sequences (S2: PP943202; S3: PP943203; S4: PP943204; S5: PP943205) against the reference sequences (A) endogenous cotton pararetroviral elements (eCPRVE; OR269951) and (B) cotton virus A (CotV-A; OR184923). Each map shows the maximum, minimum, and average coverage at each genome region; scaled genome positions are shown above the histogram, and the Y-axis shows coverage in number of reads. Within the peaks, from top to bottom, the colors represent the maximum coverage (read counts), the average coverage, and the minimum coverage. (C) Schematic of the genome organization of the endogenous viral elements from GA (1) compared with a putative eCPRVE (2) and CotV-A (3). Stop codons in the open reading frames (ORFs) coding for the putative viral coat protein and reverse transcriptase are marked with red dots.

Viruses 2024, 20
Figure 6. Graphic representation of the origin of tetraploid Gossypium spp. from their diploid ancestors along a timeline, with a possible virus integration event indicated by a green cotton plant and a blue arrow.

Table 1. Oligonucleotide primers and targeted virus genes used in this study.

Table 2. Read coverage, matching reads, and percent nucleotide identity of long non-coding RNAs and small RNAs against virus sequences suspected in the occurrence of terminal abortion.

Table 3. Detection of endogenous viral elements in cotton samples from field and greenhouse conditions.
a Sample percentages are rounded to the nearest decimal. ND, not detected; NT, not tested; EVEs, endogenous viral elements; CLRDV, cotton leafroll dwarf virus; TSV, tobacco streak virus.